Use of machine learning to identify protective factors for death from COVID-19 in the ICU: a retrospective study

Background Patients in serious condition due to COVID-19 often require special care in intensive care units (ICUs). The disease has affected over 758 million people and resulted in 6.8 million deaths worldwide. Because its progression varies from individual to individual, it is essential to identify the clinical parameters that indicate a good prognosis for the patient. Machine learning (ML) algorithms have been used to analyze complex medical data and identify prognostic indicators. However, there is still an urgent need for a model that elucidates the predictors related to patient outcomes. Therefore, this research aimed to verify, through ML, the variables involved in the discharge of patients admitted to the ICU due to COVID-19.

Methods In this study, 126 variables were collected, covering demography, length of hospital stay and outcome, chronic diseases and tumors, comorbidities and risk factors, complications and adverse events, health care, and vital indicators of patients admitted to an ICU in southern Brazil. These variables were filtered and then selected by an ML algorithm, decision trees, to identify the optimal set of variables for predicting patient discharge using logistic regression. Finally, a confusion matrix was computed to evaluate the model's performance with the selected variables.

Results Of the 532 patients evaluated, 180 were discharged: female (16.92%), with a central venous catheter (23.68%), with a bladder catheter (26.13%), and with an average of 8.46 days of bladder catheter use and 23.65 days of mechanical ventilation. In addition, the chances of discharge increase by 14% for each additional day in the hospital, by 136% for female patients, by 716% when there is no bladder catheter, and by 737% when no central venous catheter is used. However, the chances of discharge decrease by 3% for each additional year of age and by 9% for each additional day of mechanical ventilation. On the training data, the model achieved a balanced accuracy of 0.81, sensitivity of 0.74, specificity of 0.88, and a kappa value of 0.64. On the test data, it achieved a balanced accuracy of 0.85, sensitivity of 0.75, specificity of 0.95, and a kappa value of 0.73. The McNemar test found no significant differences in the error rates in the training and test data, suggesting good classification.

Conclusions This work showed that female sex, the absence of a central venous catheter and bladder catheter, and shorter durations of mechanical ventilation and bladder catheter use were associated with a greater chance of hospital discharge. These results may help develop measures that lead to a good prognosis for the patient.


The association rules were obtained with the Apriori algorithm (ZHAO; ZHANG; CAO, 2009, p. 4), as implemented in the arules library (HAHSLER, 2021) in R (TEAM, 2013). To use this algorithm, the data set must be converted to a transaction database and the continuous variables must be discretized, that is, their values divided into intervals. Thus, each continuous variable was divided into three categories: age into 0 to 55 years, 56 to 68 years and 69 to 95 years; length of hospital stay into 1 to 8, 9 to 17 and 18 to 522; highest 1h creatinine into below 0.6, greater than or equal to 0.6 and less than 1, and 1 to 28; BUN into less than 21.5, greater than or equal to 21.5 and less than 34.7, and 34.7 to 204; duration of mechanical ventilation

Association Rules
Figure 1 presents the Pearson correlation analysis. Variables with a correlation greater than 0.75 were eliminated.
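The correlation filter can be illustrated with a minimal Python sketch. It is not the authors' code (the study was carried out in R). The greedy keep-or-drop order, the use of the absolute value of the correlation, and the toy data are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined for constant columns (zero variance)

def drop_correlated(columns, threshold=0.75):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data, not taken from the study.
data = {
    "age": [30, 45, 50, 62, 71, 80],
    "age_months": [360, 540, 600, 744, 852, 960],  # redundant with age
    "stay_days": [12, 3, 25, 7, 18, 5],
}
kept = drop_correlated(data)
print(kept)
```

In this sketch the first variable of each highly correlated pair is retained; other tie-breaking rules (for example, dropping the variable with the larger mean correlation) are equally plausible.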

Mined association rules
The graph below shows the rules and their respective confidence levels. It can also be generated interactively, so that hovering the mouse over a rule displays its details and the best rules are easy to find (here, the best rules are those with the highest confidence, since the rules are sorted by confidence before the graph is created). The interactive version can be provided if it is useful for your research.

Cobert
Table 5 highlights sets of characteristics that are associated with patients' Discharge. For example, rule 2 shows an association between urinary bladder catheter (false) together with MAP catheter (false) and Discharge. This rule has a support of 0.077, which represents 37 (count) patients (7.7% of the total). The confidence column gives the confidence of the rule: 0.9736 means that 97.36% of patients with urinary bladder catheter (false) and MAP catheter (false) had Discharge. Furthermore, the lift is 2.7978, meaning that Discharge is 2.7978 times as frequent among these patients as in the sample as a whole, that is, there is a 179.78% greater chance of Discharge.
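The arithmetic behind the lift interpretation can be checked with a short Python sketch. It relies on the standard definition of lift as the rule's confidence divided by the overall frequency of the consequent. The confidence and lift values are those reported for rule 2; the variable names are illustrative.

```python
# Values reported for rule 2 in the text.
confidence = 0.9736  # share of patients matching the rule who had Discharge
lift = 2.7978        # confidence divided by the overall Discharge rate

# Overall Discharge rate implied by the two reported values.
baseline_rate = confidence / lift

# Relative increase in the Discharge rate for patients matching the rule.
increase_pct = (lift - 1) * 100

print(f"implied overall Discharge rate: {baseline_rate:.3f}")
print(f"relative increase: {increase_pct:.2f}%")
```

A lift of 1 would mean that the rule's antecedent carries no information about Discharge, so the distance from 1 is what the percentage expresses.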

Figure 2. Association rules and confidence level
Figure 2. Standardized residuals

Table 1. Zero and near-zero variance

In Table 2, we present the missing values per variable. Note that some variables have a large share of missing values. All variables with more than 50% missing values were eliminated.
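The 50% missing-value rule can be sketched in Python as follows. This is an illustration, not the authors' R code; representing missing values as `None`, the function names, and the toy columns are assumptions.

```python
def missing_fraction(values):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def drop_sparse_columns(columns, max_missing=0.5):
    """Keep only the columns whose missing fraction does not exceed max_missing."""
    return {
        name: values
        for name, values in columns.items()
        if missing_fraction(values) <= max_missing
    }

# Hypothetical toy columns, not taken from the study.
columns = {
    "age": [61, 54, None, 70],           # 25% missing: kept
    "lactate": [None, None, 2.1, None],  # 75% missing: dropped
}
filtered = drop_sparse_columns(columns)
print(sorted(filtered))
```

With this rule a column with exactly 50% missing values is kept, which matches the wording "more than 50%".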

Table 2. Missing values per variable

Table 3 presents the information gain per variable. The variables that stand out the most are the use and duration of mechanical ventilation. Additionally, approximately 37% of the predictor variables show no information gain. Those with null information gain were eliminated.
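As an illustration of the quantity used here, the following Python sketch computes entropy-based information gain for categorical predictors. The source does not state which implementation was used, so the entropy formulation, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the outcome minus its entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data, not taken from the study.
outcome = ["discharge", "discharge", "death", "death"]
ventilated = [False, False, True, True]  # separates the outcomes perfectly
ward = ["A", "B", "A", "B"]              # carries no information

gain_ventilated = information_gain(ventilated, outcome)
gain_ward = information_gain(ward, outcome)
print(gain_ventilated, gain_ward)
```

Under this definition, a predictor such as `ward` with a gain of zero would be eliminated, while `ventilated` would be retained.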

Table 3. Information gain per variable

Table 4. Area under the curve by variable

The duration of mechanical ventilation was divided into 0 to 3, 4 to 11 and 12 to 60. The discretizations were made so that each class has approximately the same number of samples. Table 1 presents the first two transactions.

Table 1. Transaction database
Source: prepared by the author.
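Equal-frequency discretization and the conversion of a record into a transaction can be sketched in Python as follows. The study used R and reports its own cut points; the ages, the item naming scheme, and the helper names below are hypothetical and only illustrate the principle.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def equal_frequency_bins(values, n_bins=3):
    """Assign each value to one of n_bins classes of roughly equal size."""
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]

def to_transaction(record):
    """Turn a record of categorical fields into a set of 'field=value' items."""
    return {f"{field}={value}" for field, value in record.items()}

# Hypothetical ages, not taken from the study.
ages = [30, 41, 52, 56, 60, 68, 70, 81, 95]
age_bins = equal_frequency_bins(ages)
bin_sizes = Counter(age_bins)

# One patient expressed as a transaction (field names are illustrative).
transaction = to_transaction({"age_class": age_bins[0], "outcome": "Discharge"})
print(bin_sizes, transaction)
```

Each transaction is then a set of items, which is the input format that frequent-itemset algorithms such as Apriori operate on.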

Table 5. Factors associated with the Discharge outcome

Table 6. Factors associated with the unit outcome being Death

According to Table 6, the five most important variables for the Random Forest model are duration of mechanical ventilation, length of hospital stay, age, BUN and highest 1h creatinine. The other variables, albeit on a smaller scale, also contribute to explaining the outcome.

Table 6. Cutoff point adjustment
